Until now we’ve defined epipolar geometry, which relates the views from two cameras: if the relative pose of the two cameras is known, then finding a point from one view in the other image reduces to a one-dimensional search along the epipolar line.
To make life easy, we make some simplifying assumptions:
The correspondence problem: multiple match hypotheses satisfy the epipolar constraint, but which is correct?
The verged case
We need some additional constraints. The epipolar constraint must hold as a consequence of the camera geometry, but there are other soft constraints that are likely to hold as well:
To find matches in the pairs assume:
Dense Correspondence Search: for every pixel/window in the left image, compare it with every pixel/window on the same epipolar line in the right image and pick the position with the minimum match cost (pick your flavor: sum of squared differences, normalized correlation, etc.).
This is a notional image of the disparity measured via sum of squared difference across the scan line.
Suppose one image is slightly brighter because the camera gains are off; direct subtraction will then pick up the intensity scaling as error and might not produce the best match. Using normalized correlation instead of raw SSD enforces the similarity constraint while tolerating gain differences.
import cv2
import numpy as np
from cs6476helpers import *
left = cv2.imread("images/18_04_01.PNG")
right = cv2.imread("images/18_04_02.PNG")
imshow(left)
imshow(right)
left_gray = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
right_gray = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)
patch_loc = (119, 169)
patch_size = (100, 100)
patch_left = left_gray[patch_loc[0]:(patch_loc[0] + patch_size[0]), patch_loc[1]:(patch_loc[1] + patch_size[1])]
imshow(patch_left)
strip_right = right_gray[patch_loc[0]:(patch_loc[0] + patch_size[0]),:]
imshow(strip_right)
def find_best_match(patch, strip):
    # Number of horizontal positions the patch can occupy in the strip.
    num_positions = strip.shape[1] - patch.shape[1]
    errors = []
    for i in range(num_positions):
        # Cast to float so the uint8 subtraction cannot wrap around.
        diff = patch.astype(np.float64) - strip[:, i:(i + patch.shape[1])].astype(np.float64)
        errors.append(np.sum(diff ** 2))
    best_x = np.argmin(np.asarray(errors))
    return best_x

best_x = find_best_match(patch_left, strip_right)
patch_right = strip_right[:,best_x:(best_x+patch_size[1])]
imshow(patch_right)
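The matcher above uses raw SSD, which, as noted earlier, can fail when one image is brighter than the other. A normalized-cross-correlation variant is sketched below; the function name `find_best_match_ncc` is my own, not part of the course helpers.

```python
import numpy as np

def find_best_match_ncc(patch, strip):
    """Match a patch along a strip using normalized cross-correlation.

    Unlike raw SSD, NCC subtracts each window's mean and divides by its
    norm, so a global gain or offset difference between the two images
    does not change the score. (Sketch, not the course implementation.)
    """
    patch = patch.astype(np.float64)
    p = patch - patch.mean()
    p_norm = np.sqrt((p ** 2).sum())
    scores = []
    for x in range(strip.shape[1] - patch.shape[1] + 1):
        window = strip[:, x:x + patch.shape[1]].astype(np.float64)
        w = window - window.mean()
        w_norm = np.sqrt((w ** 2).sum())
        denom = p_norm * w_norm
        # Guard against flat (textureless) windows with zero variance.
        scores.append((p * w).sum() / denom if denom > 0 else -1.0)
    return int(np.argmax(scores))  # highest correlation, not lowest SSD
```

Because each window is mean-subtracted and normalized, multiplying one image by a gain or adding an offset leaves the scores essentially unchanged.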
Two images with a horizontal epipolar line where things aren’t quite so clear cut. There will be ambiguities and false positives!
Following the same idea of sliding along the epipolar line searching for a match yields pretty good results when we do cross correlation!
What happens when the point falls in an area where the image is mostly textureless (a white wall!)?
Well, crap. That’s a lot of peaks. How do we justify picking one? How do we fix this? Our problem can be resolved by thinking about picking a larger window.
Which regions are good for stereo matching?
Basically anywhere with unique textures! Discard anything that’s too dark or too light.
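One way to operationalize "unique texture" is to score each window by its horizontal gradient energy, since the search runs horizontally along the epipolar line. This is a minimal sketch; the helper name `texturedness` and the default window size are my own choices, not from the lecture.

```python
import numpy as np

def texturedness(gray, x, y, win=15):
    """Score how promising a window centered at (x, y) is for stereo
    matching: the sum of squared horizontal intensity differences.

    Horizontal gradients matter most because the search slides along a
    horizontal epipolar line; a vertical edge localizes well, while a
    flat wall scores near zero. (Hypothetical helper, for illustration.)
    """
    half = win // 2
    window = gray[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    gx = np.diff(window, axis=1)  # horizontal finite differences
    return float((gx ** 2).sum())
```

Thresholding this score is a simple way to discard windows that are too flat to match reliably.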
How should we choose an effective window size? There’s no easy answer.
For a small window size in this image we can match the branches pretty well, but everything else is quite poor! Increasing the window size matches the ground pattern well, but performance on the tree branches drops.
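The window-size tradeoff can be seen on a synthetic strip: against repeating texture, a window smaller than the repetition period has many equally good matches, while a window large enough to capture a unique feature has only one. Everything below (the stripe pattern, the blob, the sizes) is an illustrative assumption.

```python
import numpy as np

def ssd_curve(patch, strip):
    """SSD match cost at every horizontal position along the strip."""
    patch = patch.astype(np.float64)
    costs = []
    for x in range(strip.shape[1] - patch.shape[1] + 1):
        window = strip[:, x:x + patch.shape[1]].astype(np.float64)
        costs.append(((patch - window) ** 2).sum())
    return np.asarray(costs)

# A strip of periodic vertical stripes (period 8) plus one unique bright
# blob; the values are synthetic, chosen only to illustrate the effect.
strip = np.zeros((10, 96))
strip[:, ::8] = 100          # repeating texture
strip[:, 50:52] = 255        # the only unique feature

small = strip[:, 16:24]      # 8-px window: one stripe period, ambiguous
large = strip[:, 16:56]      # 40-px window: includes the unique blob

small_minima = np.sum(ssd_curve(small, strip) == ssd_curve(small, strip).min())
large_minima = np.sum(ssd_curve(large, strip) == ssd_curve(large, strip).min())
# The small window ties at many positions; the large window has a unique minimum.
print(small_minima, large_minima)
```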
Beyond the hard constraint let’s revisit the soft constraints of uniqueness and order.
Uniqueness says that there is no more than one match. But why "no more than one" rather than exactly one? Occlusion!
In this image you can see that as the sight line slides from right to left, the green bar occludes portions of the right bar, and different portions are visible in each image.
(These are half occluded because they are visible in one image. Fully occluded is not visible in either image.)
This is why there is rarely an exact one-for-one match.
Typically a single solid surface will follow the ordering constraint. What’s not a single solid?
Transparent surfaces with markings on them!
More commonly a narrow occluding surface will cause this.
import cv2
import numpy as np
from cs6476helpers import *
left = cv2.imread("images/18_04_01.PNG")
right = cv2.imread("images/18_04_02.PNG")
left_gray = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
right_gray = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)
y = 119
b = 100
strip_left = left_gray[y:y+b,:]
imshow(strip_left)
strip_right = right_gray[y:y+b,:]
imshow(strip_right)
def disparity(strip_left, strip_right, b):
    num_blocks = strip_left.shape[1] // b
    disparities = []
    for i in range(num_blocks):
        x_left = i * b
        patch_left = strip_left[:, x_left:(x_left + b)]
        # find_best_match is the SSD matcher defined earlier.
        x_right = find_best_match(patch_left, strip_right)
        disparities.append(x_left - x_right)
    return disparities

disparity(strip_left, strip_right, b)
A couple of slightly more sophisticated methods exist for solving the ordering problem. The previous methods are called 'window-based matching'.
Optimize correspondence assignments jointly. This can be done a scanline at a time or on a full 2D grid.
We want the best set.
The image plots a scanline from the left image against a scanline from the right image. A solution mapping the left image to the right is represented by a single path through the grid, from one end of the left scanline to the other end of the right scanline, and the best match is the path with the lowest total cost.
There are three possible path types. If your path travels straight down the diagonal, it's a one-for-one match: the disparity doesn't change, so if the disparity at the far left is +3, it is +3 throughout.
If, as in the image, a range of the y-axis maps to a single x value, that is a left occlusion, while a range of x mapping to a single y is a right occlusion.
The goal is to find the least-cost path through the grid. Traveling diagonally costs the match cost between two pixels; navigating around an occlusion incurs some extra cost. There are various ways of solving this, but we will use dynamic programming: you need to be able to compute the cost of traveling horizontally, vertically, or diagonally.
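A minimal sketch of that scanline dynamic program: each cell pairs one left pixel with one right pixel, a diagonal step is a match, and horizontal or vertical steps are occlusions with a fixed penalty. The occlusion cost value here is an illustrative assumption; backtracking through the table would recover the actual disparities, but for brevity this just returns the optimal path cost.

```python
import numpy as np

def scanline_dp(left_row, right_row, occ_cost=20.0):
    """Dynamic-programming stereo on one scanline (sketch only).

    D[i, j] is the cheapest way to align the first i left pixels with
    the first j right pixels. A diagonal step matches a pixel pair
    (cost = squared intensity difference); a vertical or horizontal
    step skips a pixel in one image (an occlusion) for a fixed penalty.
    """
    n, m = len(left_row), len(right_row)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1) * occ_cost   # boundary: everything occluded
    D[0, :] = np.arange(m + 1) * occ_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = (float(left_row[i - 1]) - float(right_row[j - 1])) ** 2
            D[i, j] = min(D[i - 1, j - 1] + match,   # diagonal: match
                          D[i - 1, j] + occ_cost,    # left pixel occluded
                          D[i, j - 1] + occ_cost)    # right pixel occluded
    return D[n, m]  # total cost of the cheapest alignment
```

With a shifted scanline, the cheapest path pays two occlusion penalties (one at each end) rather than forcing bad pixel matches, which is exactly the occlusion behavior described above.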
Looking at the sample image again, we see a much better representation of depth and occlusion. There are still some artifacts, and there is streaking because each scanline is solved independently; dynamic programming cannot find a spatially coherent set of disparities over a full 2D grid. Other methods exist.
What defines a good stereo correspondence?
We might want to penalize something that has lots of jumps.
This example solves via energy minimization. The image on the right is the disparity and the energy terms are:
Find the set of disparities that minimizes the energy function.
Energy functions of this form can be minimized using graph cuts.
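The energy being minimized can be sketched as a data term (the per-pixel match cost at the chosen disparity) plus a smoothness term that penalizes disparity jumps between neighbors. The names `stereo_energy` and `data_cost` and the weight `lam` are my own for illustration; graph cuts would search over disparity maps to minimize this energy, which is not shown here.

```python
import numpy as np

def stereo_energy(disparity, data_cost, lam=1.0):
    """Evaluate the energy of a candidate disparity map (sketch).

    data_cost[y, x, d] is the match cost of assigning disparity d at
    pixel (y, x). The smoothness term charges lam * |d_p - d_q| for
    each pair of horizontal or vertical neighbors, penalizing maps
    with lots of disparity jumps.
    """
    ys, xs = np.indices(disparity.shape)
    data = data_cost[ys, xs, disparity].sum()
    smooth = (np.abs(np.diff(disparity, axis=0)).sum()
              + np.abs(np.diff(disparity, axis=1)).sum())
    return data + lam * smooth
```

A constant disparity map has zero smoothness cost, while a map full of jumps is charged for every discontinuity, which is exactly the "penalize something that has lots of jumps" idea above.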
It works pretty well, but it’s not without issues.
There are some challenges with stereo: textureless areas, occlusions, violations of brightness constancy, and very large baselines. Camera calibration errors corrupt the epipolar lines, too!